[SPARK-57103][FOLLOWUP][SQL][TESTS] Add test coverage for MIN/MAX over nanosecond-precision timestamp types#56592
[SPARK-57103][FOLLOWUP][SQL][TESTS] Add test coverage for MIN/MAX over nanosecond-precision timestamp types#56592stevomitric wants to merge 4 commits into
Conversation
MaxGekk
left a comment
There was a problem hiding this comment.
0 blocking, 1 non-blocking, 1 nit.
Clean, well-targeted tests-only PR. I verified the production claims: Max/Min gate on orderability only and dataType = child.dataType, and findWiderDateTimeType genuinely has no nanos arm, so mixed-precision is correctly out of scope (SPARK-57454). Test data is meaningful (sub-microsecond-distinct values would catch micros-truncation; the group-by test asserts no-collapse). Both findings are about test placement / a stale suite doc, neither blocks merge.
Design / architecture (1)
- TimestampNanosFunctionsSuiteBase.scala:243: MIN/MAX aggregate tests live in a functions-scoped suite, whereas the cited TimeType analogue (SPARK-52626/52660) placed its aggregate tests in
DataFrameAggregateSuite— see inline
Nits: 1 minor item (stale class Scaladoc, folded into the same inline comment).
| } | ||
| } | ||
|
|
||
| // ===== MIN / MAX aggregates over nanosecond-precision timestamps (SPARK-56822) ===== |
There was a problem hiding this comment.
These are MIN/MAX aggregate tests, but they land in a suite whose class Scaladoc (lines 28-33) scopes it to "the hour, minute and second functions". The TimeType precedent this PR explicitly mirrors put its analogous aggregate tests in DataFrameAggregateSuite (SPARK-52626 group-by at :3713, SPARK-52660 codegen-split max/min at :3724) — the dedicated aggregate suite. Consider co-locating these there (or in a dedicated nanos-aggregate suite) so aggregate-over-temporal-type coverage stays discoverable in one place.
There's a legitimate pull toward keeping them here — this suite gives the ANSI on/off matrix, the nanos preview-flag session, and the fixed session timezone for free — so this is a judgment call, not a defect.
Minor, regardless of where they end up: the class Scaladoc is now stale. It still says the suite tests only hour/minute/second and that "Each test exercises both the SQL path (selectExpr) and the Scala Column API (functions.hour / minute / second)" — no longer true now that aggregate tests are present (several use only selectExpr). Please update it (or move the tests).
There was a problem hiding this comment.
Kept them here for the free ANSI on/off matrix, preview-flag session, and fixed timezone. Updated the class Scaladoc.
…N/MAX aggregate tests Addresses review feedback on apache#56592: the class Scaladoc for TimestampNanosFunctionsSuiteBase still described only the hour/minute/second functions and claimed every test exercises both the SQL and Column API paths, no longer accurate now that the MIN/MAX aggregate tests are present. Rewrote it to cover the datetime functions and the MIN/MAX aggregates and to note the ANSI on/off subclasses. Tests-only; no production change. Co-authored-by: Isaac
|
cc @MaxGekk PTAL. |
|
+1, LGTM. Merging to master/4.x. |
|
@stevomitric SPARK-57502 is wrong. Please, set correct one. |
…AX nanos tests
The MIN/MAX-over-nanos test names and golden-file comments were tagged with a
placeholder ticket, SPARK-57502, which is in fact an unrelated INFRA issue
("Fix npm vulnerabilities in ui-test and dev"). Rename the references to
SPARK-57103 ("Add ordering, compare, and hash for nanosecond timestamp types"),
the sub-task of SPARK-56822 that enabled MIN/MAX over the nanosecond types.
Tests-only; no production or behavior change.
Co-authored-by: Isaac
You are right. I set the SPARK-57103 which introduced comparisons which enabled the min/max expressions. |
|
@stevomitric Could you resolve conflicts, please. |
Resolve conflicts against SPARK-57528 (unix_timestamp over nanos), which edited
the same nanosecond-timestamp test files:
- TimestampNanosFunctionsSuiteBase.scala: keep both the SPARK-57103 MIN/MAX
aggregate tests and the SPARK-57528 unix_timestamp tests.
- timestamp-{ntz,ltz}-nanos.sql: keep both query sections.
- results/ and analyzer-results/ golden files regenerated.
Co-authored-by: Isaac
uros-b
left a comment
There was a problem hiding this comment.
Thank you @stevomitric and @MaxGekk!
|
All tests passed https://github.com/stevomitric/spark/runs/82312660388 +1, LGTM. Merging to master/4.x. |
… nanosecond-precision timestamp types ### What changes were proposed in this pull request? This PR adds end-to-end, Catalyst-level, and golden-file test coverage for the `MIN` and `MAX` aggregates (and the closely related `min_by` / `max_by` / `greatest` / `least`) over the nanosecond-precision timestamp types `TIMESTAMP_NTZ(p)` / `TIMESTAMP_LTZ(p)` (`p ∈ [7, 9]`), part of the nanosecond timestamp preview (SPARK-56822). No production code is changed. `Max`/`Min` are type-agnostic `DeclarativeAggregate`s whose only type gate is `TypeUtils.checkForOrderingExpr`, which already returns `TypeCheckSuccess` for the nanosecond types because they are orderable `AtomicType`s `Comparable` were wired in SPARK-57103, the physical value is UnsafeRow-mutable (SPARK-56981), and `CodeGenerator.genComp` already has a dedicated `case _: AnyTimestampNanoType => compareTo` arm. Since `dataType = child.dataType`, the result preserves the input precision. This mirrors the TimeType precedent (SPARK-52626 / SPARK-52660), where becoming orderable made `MIN`/`MAX` work and the remaining work was test coverage. Added tests: - `TimestampNanosFunctionsSuiteBase` (runs under ANSI on and off): precision/type preservation; sub-microsecond ordering on both the codegen and interpreted paths (two values sharing the same `epochMicros` but differing within the microsecond); all-NULL and empty input → NULL; `GROUP BY` a nanosecond key whose sub-microsecond remainder must keep distinct keys from collapsing; `min_by`/`max_by`/`greatest`/`least` over same-precision columns; and equivalence with the microsecond path when the sub-microsecond digits are zero. - `ExpressionTypeCheckingSuite`: asserts `Max`/`Min` `checkInputDataTypes()` succeeds for `TimestampNTZNanosType` / `TimestampLTZNanosType` and that the result type preserves the precision. - `timestamp-ntz-nanos.sql` / `timestamp-ltz-nanos.sql`: `MAX`/`MIN` and `GROUP BY` queries, with regenerated `results/` and `analyzer-results/` golden files. Mixed-precision inputs (e.g. `greatest(ts_p7, ts_p9)`, a `UNION` of two precisions, or comparing an aggregate against a micros literal) route through `findWiderDateTimeType`, which has no nanosecond arm yet, so they are intentionally **out of scope** here and tracked by SPARK-57454; every column in these tests is strictly same-precision. ### Why are the changes needed? `MIN`/`MAX` over nanosecond timestamps are a core, expected operation, but the nanosecond preview had no aggregate test coverage. ### Does this PR introduce any user-facing change? No. Tests only; ### How was this patch tested? New tests in this PR. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Cursor Closes #56592 from stevomitric/stevomitric/min-max. Authored-by: Stevo Mitric <stevomitric2000@gmail.com> Signed-off-by: Max Gekk <max.gekk@gmail.com> (cherry picked from commit 0d36e25) Signed-off-by: Max Gekk <max.gekk@gmail.com>
What changes were proposed in this pull request?
This PR adds end-to-end, Catalyst-level, and golden-file test coverage for the
MINandMAXaggregates (and the closely relatedmin_by/max_by/greatest/least) over the nanosecond-precision timestamp typesTIMESTAMP_NTZ(p)/TIMESTAMP_LTZ(p)(p ∈ [7, 9]), part of the nanosecond timestamp preview (SPARK-56822).No production code is changed.
Max/Minare type-agnosticDeclarativeAggregates whose only type gate isTypeUtils.checkForOrderingExpr, which already returnsTypeCheckSuccessfor the nanosecond types because they are orderableAtomicTypesComparablewere wired in SPARK-57103, the physical value is UnsafeRow-mutable (SPARK-56981), andCodeGenerator.genCompalready has a dedicatedcase _: AnyTimestampNanoType => compareToarm. SincedataType = child.dataType, the result preserves the input precision. This mirrors the TimeType precedent (SPARK-52626 / SPARK-52660), where becoming orderable madeMIN/MAXwork and the remaining work was test coverage.Added tests:
TimestampNanosFunctionsSuiteBase(runs under ANSI on and off): precision/type preservation; sub-microsecond ordering on both the codegen and interpreted paths (two values sharing the sameepochMicrosbut differing within the microsecond); all-NULL and empty input → NULL;GROUP BYa nanosecond key whose sub-microsecond remainder must keep distinct keys from collapsing;min_by/max_by/greatest/leastover same-precision columns; and equivalence with the microsecond path when the sub-microsecond digits are zero.ExpressionTypeCheckingSuite: assertsMax/MincheckInputDataTypes()succeeds forTimestampNTZNanosType/TimestampLTZNanosTypeand that the result type preserves the precision.timestamp-ntz-nanos.sql/timestamp-ltz-nanos.sql:MAX/MINandGROUP BYqueries, with regeneratedresults/andanalyzer-results/golden files.Mixed-precision inputs (e.g.
greatest(ts_p7, ts_p9), aUNIONof two precisions, or comparing an aggregate against a micros literal) route throughfindWiderDateTimeType, which has no nanosecond arm yet, so they are intentionally out of scope here and tracked by SPARK-57454; every column in these tests is strictly same-precision.Why are the changes needed?
MIN/MAXover nanosecond timestamps are a core, expected operation, but the nanosecond preview had no aggregate test coverage.Does this PR introduce any user-facing change?
No. Tests only;
How was this patch tested?
New tests in this PR.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor